.SEC(Storage#Structures and#Efficiency,,,P210:)
.SS(Introduction)
.FP
Though it is true that any algorithm can be coded in terms of
manipulations of binary trees, there are many instances in which more
efficient organizations of data exist.
At the most elementary  level are vectors and arrays.   These
data structures could  certainly be stored in a list-structure format
and individual components accessed via %3car-cdr%* chains.  However, most
machines  have a  hardware  organization which  can  be exploited  to
increase   accessing   efficiency  over   the   list  representation.
Similarly, strings can  be represented as  lists of characters.   The
string processing operations are expressible as LISP algorithms.  But
again this is usually not the most reasonable representation. Even  at
the level of list-structure operations, simple binary trees might not
be the most  expeditious representation for every problem.  Also many
of the algorithms we  have presented in  LISP are overly wasteful  of
computation time.  This section  will begin an examination of
alternatives   to  LISP  organization.     

There  is   no  best  data
representation, nor is there a  best algorithm.  The chosen representation
must match the machine and  the problem domain being studied.  If
the problem  is strictly numerical, then list-structure is not the
answer; if simple string manipulation is sufficient,
then list processing is too general.  And there are many applications
of list processing which are sufficiently well#behaved that 
complex devices like garbage collectors are unnecessary.
The point  is that if you have seen the list-processing techniques in
a rich environment  such as  LISP and have  seen the applications  to
which  LISP may  be put,  then you  will be  prepared to  apply these
techniques  in  a  meaningful  way.    Many  times  a   quick
representation in  LISP  is all  that is  needed.   You  get a  clean
representation  with comprehensible algorithms.   Once you've studied
the algorithms, efficiencies might come to mind. At that  time either
recode the problem using some  of the obscene LISP programming tricks
or drop into machine language if very fine control is required.  Once
you have  %2a%*  representation it  is  easy to  get  better ones.    For
example,  once you have  a compiler  which is   correct  it is
easier to describe a smarter one.
		
This section will describe  representations other than LISP binary trees
and will show some ways of increasing the efficiency of LISP programs.

.SS(Bit-tables,,P263:)
.FP
In the marking phase of a garbage collector it is
necessary to  record the visitation of each word.
Frequently it is not possible to place a mark in the  actual word.
This  might occur for  several  reasons: 
.SKIP 1
.NL;
%21.%1##For words in  FS, there is  no
room if each  word contains exactly two addresses.
.NL;
%22.%1# The word is
in  FWS and  the  meaning of  the information  stored there  would be
changed if we set a bit.
.NL;
%23.%1# In structures more complex than dotted pairs, there may not be room for
marking bits.
.NL; 
%24.%1# If a bit is assigned to each word, then  the initialize phase
requires that we visit each such word. This violates 
"locality of reference".⊗↓Locality  refers to the  relative distance between
memory locations assigned in a particular structure. In some machine organizations,
memory is divided into "pages" of a relatively small size. There is significant
overhead involved in crossing page boundaries. Therefore memory referencing
which entails many scattered references is said to violate "locality of
reference."← 
.pt24;
.FP
An alternative solution is to designate a separate section of memory called a
%2bit-table%1. The bit-table is a sequence of binary flags; and there
is a one-to-one correspondence between a flag and a markable location.
Whenever we wish to record the visiting of a word, we set the corresponding flag.
A bit table is represented as a sequence of machine locations with several
flags represented in each word. There is a simple calculation to relate a
bit in a word to its corresponding  markable location.
It is frequently faster to initialize a whole
table than to zero single bits in separate words.
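.GROUP;
.FP
For example, here is a minimal sketch of that correspondence calculation.
Assume the markable area begins at location %3base%1, the bit-table begins at
location %3table%1, and each table word holds %3w%1 flags; these names, and
the integer division functions %3quotient%1 and %3remainder%1, are
illustrative assumptions. The flag for a location %3loc%1 is then bit
%3bit_no[loc]%1 of word %3word_no[loc]%1, where
.BEGIN SELECT 3;TABIT2(16,19);GROUP;TURN OFF"←";
.KRK
.BOXA
word_no <= λ[[loc] plus[table;quotient[difference[loc;base];w]]]

bit_no <= λ[[loc] remainder[difference[loc;base];w]]
.BOXB
.END
.APART;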
.SS(Vectors and Arrays,,P137:)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
.FP
%2Vectors%*.##Vectors, also known as  one-dimensional arrays,  are  usually
stored sequentially in memory.  Simple vectors are usually stored one
element to a memory location, though this is not a necessity.  For
example, a complex vector is very naturally stored as pairs of cells;
or, if vectors of nonhomogeneous data modes were allowed,
each element would have to include type information.  In any case,
most languages make some restrictions on the behavior of vectors such
that efficient accessing of elements can be made.  There is a natural
simulation of  a stack as  a (sequential) vector  with access  
 made via a global pointer to the vector.
.pt24;
.FP
%2Arrays%*.##Arrays are multi-dimensional data  structures.  Since
most  machine memories are linear  devices we must map  arrays onto a
linear representation.   We will  restrict attention to  the case  of
two-dimensional  arrays.   Most  of the  discussion generalizes  very
naturally.   A very common device is to store the array by rows; that
is, each row is stored sequentially: first row 1, then row 2, and so on.
Given this representation there  is a trivial calculation to find the
location of an arbitrary element, A[i;j], if we know the location  of
the first  element  A[1;1] and  the extent of  the dimensions  of the
array.  For an array A[1:M; 1:N],
.EQ;
loc[A[i;j]] = loc[A[1;1]] + (i-1)*N + (j-1)
.pt18;
.FP
In languages like Fortran, which require that the size of the array be
known at compile-time, the compiler can generate the accessing code
without problem.  Languages like Algol 60 and some versions of LISP,
which allow the size of an array to be determined at run-time, require
a bit more care.  Algol 60, for example, requires that you declare the
type (real, boolean,  etc.) of  the array and  specify the number  of
dimensions  in the  array,  but you  can postpone  until  run-time the
designation of the size of each dimension.  To handle this complexity
a dope vector is  introduced. 
A %2⊗→dope vector↔←%*  is typically a header or descriptor associated with
the area containing the actual array elements. The information in the dope vector
tells the functions  which access the array  how to treat the data.
Type and dimensionality are typical entries in dope vectors.

The compiler can determine  the size of
the dope vector, but  not the contents.  The dope vector is filled in
when the array declaration is executed 
and  contains information about the  actual extent of  the
array bounds.   Since  the size of the  array is not  known, the
compiler  cannot   allocate  space  for  the  array  elements.    The
allocation must also be  done at run-time.   When the array declaration  is
executed we must  know all the information about the  array.  At that
time we allocate space on the stack and complete the dope vector. Subsequent
references to array elements use the dope vector. When we leave the
block which  caused allocation of the array, the 
stack space is reclaimed.

Assume that
the  array elements are stored  by rows.  Look  at the calculation of
the location of element A[i;j].  For a specific execution of an array
declaration much of this information is constant; the location of
the array elements, in particular A[1;1], and the number of columns
N are fixed.
.Group;
Thus we rewrite the calculation as:
.PT18;
\###constant part\variable part

\[loc [A[1;1]]-N-1]#######+\######(i*N+j)
.pt18;
The constant  part is stored  in the  dope vector.   When we wish  to
reference  an element A[i;j] we  need only compute  the variable part
and add it to the constant part.
.APART
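.GROUP;
.FP
As a minimal sketch, if the dope vector supplies the constant part as %3c%1
and the number of columns as %3n%1 (the names are illustrative assumptions),
the accessing computation for A[i;j] is simply
.EQ
%3access <= λ[[c;n;i;j] plus[c;plus[times[i;n];j]]]%1
.PT18;
.APART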

The dope vector for A[1:M; 1:N] might contain
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
   		   ⊂ααααααααααααααα⊃
		   ~      2        ~
		   εααααααπααααααααλ
		   ~   1  ~    M   ~
		   εααααααβααααααααλ
		   ~   1  ~    N   ~
		   εαααααα∀ααααααααλ
		   ~ constant part ~
		   εαααααααααααααααλ
		   ~    . . .      ~
		    array elements
		   ~    . . .      ~
		   %ααααααααααααααα$
.END
.BOXB
.FP
There is another scheme  for storing arrays which is  used in
some of  the Burroughs machines#(⊗↑[Org#71]↑,#⊗↑[Dor#76]↑). 
Each row  is stored sequentially and
access  to  separate  rows  is   made  through  a  device  called   a
%2⊗→mother-vector↔←%*.   The mother-vector is  a vector of pointers  to the
rows.  
.GROUP;
Thus,
.BOXA

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "[" FOR "\";
.TURN OFF "\";
.TABS 57,61;
.KRK
                                                  
       ⊂ααααααα⊃    ⊂αααααααααααααααααα     [αααα[⊃
       ~    #ααβααα→~ A∞411∞*  A∞412∞*       ...[A∞41N∞*[~
       εαααααααλ    %αααααααααααααααααα     [αααα[$
       ~    #ααβαα→  . . .
       εαααααααλ
          . . .

       εαααααααλ    ⊂αααααααααααααααααα     [αααα[⊃
       ~    #ααβααα→~ A∞4M1∞*  A∞4M2∞*       ...[A∞4MN∞*[~
       %ααααααα$    %αααααααααααααααααα     [αααα[$
.END
.BOXB;
.APART

Notice that the accessing computation is very  cheap.⊗↓However access
by array columns can be expensive. If each row is on a separate
page in the machine, the access overhead can be substantial.←   Another effect
is  that all rows  need not  be in memory  at once.   On a  paging or
segmenting machine 
this array organization can be helpful.  If an access to a row not in
core  is made, a "page#fault" is raised; the  monitor brings the row
into memory and the computation continues.   The mother-vector scheme
generalizes  nicely  to   multidimensionality  and  can  be  used  in
conjunction with a dope vector.
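.GROUP;
.FP
Note that under the mother-vector scheme the address calculation for A[i;j]
requires no multiplication at all; as a minimal sketch, if mother[i] denotes
the i-th entry of the mother-vector, then
.EQ
loc[A[i;j]] = mother[i] + (j-1)
.PT18;
.APART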

.END
A typical implementation of an array facility in LISP would include
a declaration:

.DEF
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... ;<bounds>], where the
identifier names the array; the type could be numeric or S-expr; and  finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
              ⊂ααααααπαααα⊃    ⊂ααααααααααααα⊃
              ~ SUBR ~  #αβ→αα→~ dope vector ~
              %αααααα∀αααα$    ~ calculation ~
			       εαααααααααααααλ
			       ~    array    ~
			       ~  elements   ~
			       	    . . .
			       %ααααααααααααα$
.END
.BOXB
.FP
If we are to store S-exprs in the array, then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.

When an array element is to be referenced, the subscripts are evaluated
(since the array name was declared as a %3SUBR%*) and the dope vector code
is executed. That computation results in a reference to the appropriate cell.

.GROUP;
We also must be able to store information in the array.
.DEF
%3store[%1<name>[<subscr>; ... ;<subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.APART
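.GROUP;
.FP
For example, here is a minimal sketch of the facility in use; the %31:2%1
notation for a bounds pair is an illustrative assumption.
.BEGIN SELECT 3;TABIT2(16,19);GROUP;TURN OFF"←";
.KRK
.BOXA
array[a;sexpr;1:2;1:3];
store[a[2;3];(X . Y)];
a[2;3]
.BOXB
.END
.FP
The declaration creates a 2-by-3 array of S-exprs named %3a%*; the %3store%*
places %3(X . Y)%* in the last element; and the final reference retrieves it.
.APART;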

.CENT(Problem)
.NL
1.##Implement a stack in LISP first using lists or dotted pairs, then using
an array. Include implementations of the stack operations.
.SS(Strings and Linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
.FP
On {yon(P216)} we discussed one representation for LISP print names:
a linked list of full words, each containing part of the external
name for the atom. Print names are a special instance of a data structure
named strings, and our use so far in LISP of strings has been restricted to
manipulating string constants. In this section we will discuss alternative
representations for strings, and discuss components of a string-manipulating
language. 
The
organization  and  implementation  of   a  general  string  processor
directly parallel those of LISP.  In fact a subset of LISP,
⊗→linear LISP↔←,  is a nice notation for string algorithms.

A %2string%1 is a sequence of characters.   A language
for string processing should allow the creation of strings
of arbitrary length at runtime; it should allow the generation of new
strings and the  decomposition of existing  strings.  If  strings of
arbitrary length are  to be created, an organization similar to FS in
LISP can be used with, perhaps, a string garbage collector.   We will
assume this most general case.
The machine memory will  contain at least a ⊗→string  space↔←, an
evaluator, a symbol table, and a garbage collector.

String space is a linear sequence of cells, each of which can
contain exactly  one character.   A string  will be  represented as  a
sequence of contiguous character cells.  A string variable will be an
entry in  the symbol table; the current value of the variable will be
represented as a pair: character count and a pointer to the beginning
of the character sequence.

.GROUP;
Thus,

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.KRK
  ⊂αααπααα⊃
  ~ 3 ~ # ~
  %ααα∀αβα$
        ~
        %α→ααα→⊃
               ↓
    . . .ααααπαααπαααπαααπαααπαααπααα . . .
	       A   B   B   1   D   . . .    string space
    . . .αααα∀ααα∀ααα∀ααα∀ααα∀ααα∀ααα . . .
.BOXB
.END
encodes the string %3ABB%*.
.APART

.BEGIN TURN OFF"←";
We must make some decisions about how  we manipulate strings: when we
perform %3x ← y%*, do we copy the symbol table pair of %3y%* into that of %3x%*, or
do we make a new string isomorphic to %3y%* and point %3x%* at it?   It makes
a  difference.    We  will   choose  the  former,  copying  only  the
"descriptor,"  that  is,  we  will  share  strings  (and  substrings)
wherever possible. This decision makes the  storage requirements less
expensive, but will make our  lives more difficult when we worry about
⊗→garbage collection↔←.   There are  three  primitive functions: ⊗→%3first%*↔←,
⊗→%3rest%*↔←,  ⊗→%3concat%*↔←  (read: %3car%*,  %3cdr%*,  %3cons%*).
.END

.GROUP;
.DEF
%3first[x]%*  is  the first
character of the string represented by %3x%*.  %3first%* is undefined for the
empty string,#%7e%1. For example,

.EQ
%3first[ABC]%1 is %3A%1; %3first[%7e%3]%1 is %9B%1 

.APART
.GROUP;
.DEF
%3rest[x]%* is  the string  of characters  which remains  when the  first
character of  the string is deleted.  %3rest%* is also undefined for the
empty string. For example,

.EQ
%3rest[ABC]%* is %3BC%*

.APART
.GROUP;

.DEF
%3concat[x;y]%* is a function of two arguments.   %3x%* is a character; %3y%* is
a  string. %3concat%* forms  a string  consisting of the  concatenation of a
copy of %3x%* and a copy of the string  %3y%*.  For example,

.EQ
%3concat[A;BC]%* is %3ABC%* 
.APART
.GROUP;
	
There are three string predicates:

.EQ
⊗→%3char%*↔←%3[x]%1:  is %3x%* a single character?
.EQ1(24);
\⊗→%3null%*↔←%3[x]%1:  is %3x%* the empty string?
\%3x ⊗→%3=%*↔← y%1:  are %3x%* and %3y%* the same character?
.END

.APART
.GROUP;
For example:

.EQ1(24);
\%3char[A] %1is %et%1
\%3char[AB] %1is %ef%1
\%3AB = AB %1is %9B%1
.END
.pt18
.APART
.FP
The implementation  of these string primitives is somewhat  less complex
than that of LISP primitives.
%3first%* generates a character  count of one and a  pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.

%3concat%* is a  bit more  problematic.   We copy %3x%*  and copy  %3y%*
so that %3y%1 follows %3x%1 in free string space; we
generate  a character  count of  one plus the count of  %3y%*,  and
generate a pointer to the copy of %3x%*.  The copies are
made in the free string space pointed to by the %2⊗→string space pointer↔←%*.
The obvious  question of  what to do  when string space  is exhausted
will be postponed  for a moment.   The implementations  of the  three
predicates are easy: we will blur  the distinction between characters
and strings  of length one.  Thus %3char%*  need  only check the character count.
%3null%* gives %et%1 if the count is zero.  What about#=#?  Obviously characters
are  not  stored   uniquely,  so  we must  make  an  actual character
comparison.
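.GROUP;
.FP
These primitives suffice for typical string algorithms. For example, here is
a minimal linear LISP sketch of a string equality predicate; the name
%3streq%1 is an illustrative assumption:
.BEGIN SELECT 3;TABIT2(16,19);GROUP;TURN OFF"←";
.KRK
.BOXA
streq <= λ[[x;y]
\\[null[x] → null[y];
\\ null[y] → %ef%3;
\\ first[x]=first[y] → streq[rest[x];rest[y]];
\\ %et%3 → %ef%3]]
.BOXB
.END
.FP
Note that %3streq%1 uses only %3first%1, %3rest%1, and the three predicates;
it consumes no new string space.
.APART;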

The implementation of storage management in a string processor can be more
complex than a simple LISP implementation. We use a garbage collection
scheme which, in  some  ways,
is simpler and, in some ways, more complicated  than a collector
of list-structure.   The  marking phase  is much  simpler  since we use  the
descriptor in the symbol table to mark  the character string.  Since we
are sharing  substrings, we cannot stop the marking simply because we
have encountered a previously  marked character, but at least
the marking is not recursive.   

However, the collection
phase needs to be  more sophisticated for  string processors.   Since
strings are  stored linearly (or  sequentially), a  fragmented string
space  list  is  of  little  use.    Thus  we  must compact  all  the
referenceable strings into one end of string space, freeing  a linear
block for the new free string space.  Since we are sharing substrings,
a  little care  needs  to  be exercised.    When  we move  a  string,
obviously the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location.  So before
we begin the relocation of strings, we sort the string descriptors on
the  basis of their  pointers into  string space.   We then recognize
each parent string, moving it down into freed  locations and updating
the address  pointers of any  substrings.  We  continue this process.
Eventually all strings will  be compacted down  in string space.  The
string space  pointer will  be set  and the  computation can  continue.
Compacting garbage collectors  can be adapted for use in LISP or more
general types of data structures.

.END
.SS(A Compacting Collector for LISP,,P291:)
.P173:
.FP
We can combine the simplicity of the original mark-sweep garbage collector
with the sophistication of the collection phase of the string
garbage collector and 
produce a  compacting garbage collector for LISP.

The intention is to move all active structures to contiguous locations at 
one end of the appropriate space. 
There are several motivations for compacting storage. First, besides making the
%2active%* storage contiguous, we also make the %2free%* locations contiguous.
Thus the free lists can be handled as arrays rather than as lists. To get the next
free element, take the next element in the free array. 

The second reason for considering  compaction comes up in languages other
than LISP.⊗↓As we shall soon see, the rationale is
applicable in LISP as well.←  If the language allocates storage in a manner similar to LISP
but the constructs allow %6different%*-sized blocks to be specified
 (a string processor is a simple example), then
compaction is a strong contender. To subdue the fragmentation of memory
we could consider compacting all free locations to one end of memory.
More general schemes are also viable. We will talk about these possibilities
when we discuss dynamic allocation.

Another reason for 
concern with compacting is related to hardware. If the underlying machine is
using a paging scheme, then we can try to minimize page-faults by keeping
the LISP structures localized. In the worst case, we could have every element
of a list on a separate page; this would obviously increase machine 
overhead.⊗↓Very little empirical work has been done on the actual storage
requirements and running environment of LISP. 
A start is made in ⊗↑[Cl#76]↑; much more should be done.← 
However,
we cannot restrict the operations of the LISP programmer. The underlying hardware
%2must%* be invisible to the user. The next best thing is to try to keep
the structures as local as possible; compaction of spaces is a first attempt
at this. We will discuss other lower-level tricks later.

So granted that compaction is a worthwhile endeavor, how do we proceed?
Clearly we can't simply mark all the
active cells and then move them into unmarked cells to compact the space.
We must also maintain the original topological relationships between the elements.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
     ⊂αααπααα⊃     ?     ⊂αααπααα⊃     
204  ~204~204~∞1 is not the same as ∞b   ?100  ~204~204~     
     %ααα∀ααα$     ?     %ααα∀ααα$                
.BOXB
.END
                                                 
Besides moving the cells, we must also update each reference to a moved location:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
     ⊂αααπααα⊃     ?     ⊂αααπααα⊃     
204  ~204~204~∞2 is∞1 the same as ∞b   ?100  ~100~100~     
     %ααα∀ααα$     ?     %ααα∀ααα$                
.BOXB
.END
.FP
To handle these problems, we expand the sweep phase into two phases:
the %2relocating%* phase and the %2updating%* phase. As with the sweep phase,
there are relocating and updating phases for %6each%* space.

The relocating phase begins after all active structure is marked.
Assume   we are to compact all active structure %6down%* to the
bottom of the space. First we initialize two pointers: a %2free%* pointer
to the lowest cell in the space; and an %2active%* pointer to the top of the space.
We move the active pointer down until we come across a marked location;
we move the free pointer up until we locate an %6unmarked%* cell. 
We want to  move that marked cell down into the free location; but
we must also supply enough information to maintain the original relationships
in the transformed structure. The cell we move  may reference other cells
which will be moved.  

.BEGIN GROUP;
Here's a picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.BOXA
 		     . . .
		   ⊂αααπααα⊃     
	       77  ~ 65~402~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃     
	      100  ~   ~   ~ ←∞1free pointer∞*
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      155  ~   ~   ~ 
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      204  ~402~ 77~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃ 
	      402  ~204~402~ ←∞1active pointer∞*
		   %ααα∀ααα$ 
		     . . .
.BOXB
.END
.END
.FP
Cell %b77%* was active so we left it alone; it references cell %b65%*,
which has already been visited, and also references cell %b402%*, which is about
to move.
let everyone know where the contents has gone, we leave a forwarding address of
%b100%* in location %b402%*.
.BEGIN GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
		   ⊂αααπααα⊃     
	      100  ~204~402~ ←∞1free pointer∞*
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃ 
	      402  ~    100~ ←∞1active pointer∞*
		   %ααα∀ααα$ 
.BOXB
.END
.END
.FP
The active pointer, having writ, moves on; 
it skips over any unmarked cells, looking for the next marked
location. Assume the next marked
location is %b204%*. It stops there and waits for the free pointer
to discover that location %b155%* is the next free location.
In its search the free pointer will  skip over any marked cells.
The same relocation operation occurs: the contents of %b204%* is moved
to location %b155%*, and the forwarding address of %b155%* is placed in location
%b204%*. The process continues until the two pointers collide.
Call that collision point %2col%*.
When they  meet, all locations above %2col%*  either will be free or will
contain forwarding addresses. All addresses, %2col%* and below, will contain
marked words or relocated cells.
We are now ready to enter the %2update%* phase.

.BEGIN GROUP;
Here is the picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.boxa
 		     . . .
		   ⊂αααπααα⊃     
	       77  ~ 65~402~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃     
	      100  ~204~402~ 
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      155  ~402~ 77~ 
		   %ααα∀ααα$                
                     . . .   ← ∞2col∞*
		   ⊂αααπααα⊃     
	      204  ~   ~155~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃ 
	      402  ~   ~100~ 
		   %ααα∀ααα$ 
		     . . .
.boxb
.END
.END
.FP
We examine the initial segment of our space from the bottom to %2col%*
looking for any references to that area %6above%* %2col%*. A reference
to that area must be changed. What is found in the referenced cell is not
the desired information, but is the forwarding address of the desired information.
What to do is obvious: tell the sender what the change of address is.
Thus the %3cdr%*-part of cell %b77%* becomes %b100%*; the %3car%*-part doesn't 
change. Cell %b100%* refers to two relocated cells; we find their forwarding 
addresses, and cell %b100%* becomes 
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
		   ⊂αααπααα⊃     
	      100  ~155~100~ 
		   %ααα∀ααα$                
.BOXB
.END
.FP
Similar treatment is given to cell %b155%*, modifying the %3car%*-part.
When all cells below %2col%* are updated, the garbage collection is finished.
The cells above %2col%* are all available for the free-list.
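.GROUP;
.FP
Here is a minimal sketch of that update step for a single cell %3z%1 at or
below %2col%1; %3above[w]%1 is assumed to test whether %3w%1 lies above
%2col%1, %3fwd[w]%1 to fetch the forwarding address left in the relocated
cell %3w%1, and the pointer-modifying functions %3rplaca%1 and %3rplacd%1
are those of {yonss(P171)}.
.BEGIN SELECT 3;TABIT2(16,19);GROUP;TURN OFF"←";
.KRK
.BOXA
update <= λ[[z] prog[[]
\\[above[car[z]] → rplaca[z;fwd[car[z]]]];
\\[above[cdr[z]] → rplacd[z;fwd[cdr[z]]]] ]]
.BOXB
.END
.APART;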

.GROUP;
.CENT(Problems)
.NL
1.##Is %2col%* in the free space list after the update phase?
.NL
2.##Write a LISP algorithm for compacting garbage collection in LISP.
.APART;
.SS(Representations of Complex Data Structures,,P138:)
.FP
In our discussion of abstract context-free data structures in {yonss(P287)},
we isolated three kinds of  structures:  

.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 ...  %eD%41%1  
e.g.,←<seq> ::= %2(%*<seq elem>%2,%* ..., <seq elem>%2)%*
.END

.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 | %eD%42%1 | %eD%43%1  
e.g.,←<seq elem> ::= <indiv> | <seq>
.END

.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 %eD%42%1 %eD%43%1 ... %eD%4n%1
e.g.,←<sexpr> ::= %3(%*<sexpr> . <sexpr>%3)%*
.END
.PT18;
.FP
We have discussed the behavioral characteristics of algorithms
which operate on these structures. Now we wish to examine
the storage structure aspects of these data structures.

Corresponding to these three
data structures are three "natural" storage representations.
By "natural" we mean that even though all these structures
can be represented as LISP S-expressions, for example, there
are representations which might better suit the operations
which we expect to perform on those structures.
Since "natural" is not a well-defined term, we will 
clarify its meaning using examples of    context-free data structures.

The first type of data structure given above maps naturally onto
a representation which records that the object is of
type %eD%1 and contains space for storing an instance of this data type.
Elements of type %eD%1 are homogeneous, being all of type %eD%41%1;
however, the size of a type %eD%1 element is indefinite.
Depending on the operations which are to be performed on the representation,
either a list representation or  an array representation is reasonable
for the storage structure. Unless the operations are quite
complex, a sequential allocation scheme suffices.


The second type of data structure  is frequently represented as a pointer.
There really isn't any storage allocated for objects of this type.
Instances  which satisfy this equation have their storage
requirements set by one of the %eD%4i%1 alternatives.
We will discuss pointer manipulation in LISP in the next section.

This section will discuss the third abstract data structure.
The essential characteristic here is that instances of this structure
have a fixed number of components, and the types of 
those components need not be homogeneous. 
Those components are typically referenced by name.
These characteristics form a natural distinction between
this third class and  the first class, even though an appropriate encoding
would make it possible to represent either class in the other.

.GROUP;
For example, in equations like 

.BEGIN CENTERIT;
.PT18;
←<sexpr> ::= %3(%1<sexpr> . <sexpr>%3)%1  

.ONCE INDENT 0;
or ←<form> ::= <function>[<arg-list>] 
.END
.PT18;

.FP
we reference components by  selectors like %3car%1, %3cdr%1, %3func%1,
and %3arglist%1. 
.APART;

LISP  represents instances of the above equations
as objects of the first  and second types of data structure: 
variable-length lists of pointers.
As a result, we have thought of these selectors as operations
 which might require some nontrivial amount of computation
to discover the desired component, but  as we saw in
{yonss(P290)} what is algorithm and what is data depends on your point of
view. For example, we  could  think of a dotted pair as an array which has
two components, one referenced by %3car%1, one referenced by %3cdr%1.
We say "array,"  since the number of components is known; but the
element references are done by nonnumerical names.

The natural storage requirements for such objects imply a fixed amount
of storage. That storage can be sequentially allocated since the 
size of the  element will  not vary. The representation must also
encode the scheme for associating external selector with internal 
representation.  
.GROUP
For example,

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK;
.BOXA

     	⊂αααααπααααα⊃
        ~ CAR ~     ~
	εαααααβαααααλ
        ~ CDR ~     ~
       	%ααααα∀ααααα$  		  
.BOXB
.END
.APART
.FP
Notice that the  array-referencing mechanisms have  to solve a similar
problem. However, array representation is such that the dope vector
can perform a %2calculation%1 to locate the element.

The storage element which we have been developing is called
a %2⊗→record↔←%1 (⊗↑[Pop#68]↑), or a %2structure%1 (⊗↑[Alg#75]↑,
⊗↑[EL1#74]↑), or a %2plex%1 (⊗↑[Han#69]↑).⊗↓A similar device, called a 
%2hunk%1,  has been implemented
in MacLISP ⊗↑[Ste#pc]↑.← 

.BEGIN TURN OFF "←";
Besides the usual constructors, selectors and recognizers, records
are typically supplied with a function 
to modify components of a structure. This function is called  an 
%2updater%1.
Just as we can write %3A[43]#←#56%1 where %3A%1 is an array, an
updater function would implement a statement like %3car[x]#←#(A#.#B)%1.
.END
Updating of simple variables is called assignment.
A discussion of "updating"  of more general
data structures requires a deeper understanding of
the implementation and storage  structures. In the  case of
LISP, it requires a discussion of 
the modification of pointers. That is  the topic of the next
section.
.SS(%3rplaca%* and %3rplacd%*,,P171:)
.BEGIN TURN ON "←";
.FP
The discussion in {yonsec(P188)} developed an implementation of the
LISP operations in terms of  the manipulation of pointers.
Those manipulations allowed the 
creation of new structure or allowed
sharing of an existing structure. None of these operations involved the
modification of an existing structure.
In this section we will discuss 
some LISP  coding
tricks which do involve modification operations.
.BEGIN GROUP;
First, consider 

%3←append <= λ[[x;y][null[x] → y; %et%3 → concat[first[x];append[rest[x];y]]]]%1

.END
.FP
This function copies %3x%* onto the front of %3y%*. Note: %3y%* is not copied.
Or recall the %3subst%* function: it generates  a copy with the correct
substitutions made. LISP operations must make copies in general; otherwise
some very
unsavory side effects can happen.

Consider the expression %3append[(A B C);(D E)]%*.  It appears that we
could get the  effect of %3append%* by %3rest%*-ing down the list
%3(A B C)%* until we found the terminator, and then we could replace 
that terminator with a pointer to the list %3(D E)%*.  Thus,


.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃  ⊂αααπααα⊃  ⊂αααπαα⊃     ⊂αααπααα⊃  ⊂αααπαα⊃
~ A ~ #αβα→~ B ~ #αβα→~ C ~≤'. . .→~ D ~ #αβα→~ E ~≤'~
%ααα∀ααα$  %ααα∀ααα$  %ααα∀αα$     %ααα∀ααα$  %ααα∀αα$
.BOXB
.END
.FP
The resulting structure does indeed look like one  we would have
obtained from %3append%1. The  operation we performed
%6modified%1 the existing structure, instead of  %6copying%1 it as %3append%1
would have done. Such modifications can cause serious difficulties.
.GROUP;
Let us call the modifying version of %3append%1, %3nconc%1; and consider the
execution of the following sequence of statements:

.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;

%1first%3\i ← (A B C)

%1then%3 \j ← (D E)

%1and finally,%3\k ← nconc[i;j]
.pt18
.END
.APART
.FP
After execution of the third statement, %3k%1 would have value %3(A#B#C#D#E)%1.
The difficulty is that %3i%1 would  %6also%1 have this value since we
modified the structure assigned to %3i%1.
Any value
which was sharing part of the structure of %3i%* will also be changed.
Language features which   surreptitiously change values  are evil.   Sometimes,
however,   it is quite useful to be  evil.  
The modification procedures, such as %3nconc%1, can be defined
in  terms of  two  obscene
functions: %3rplaca%* and %3rplacd%1.


.BEGIN GROUP;
The  procedure  ⊗→%3rplaca%*↔←%3[x;y]%*  replaces the %3car%*-part of %3x%* with %3y%*.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA


   ∞3dest∞*	     		       ∞3env∞*
    ~				   ~
    ~	⊂ααααααα⊃		   ~	⊂ααααααα⊃
    %→  ~     #αβα⊃ 		   %→   ~     #αβαα###→
	εαααπαααλ ↓			εαααπαααλ 
          # # #	 ←$			~ x ~ # βα→α⊃
	~   ~   β→ - - →⊃		εαααβαααλ   ~
	εαααβαααλ  	  		~ y ~ #αβ→⊃ ↓
        ~   ~   ~	↓               %ααα∀ααα$ ↓ ~
          # # #	      ⊂αααπααα⊃	        ⊂ααααααα⊃ ~ ↓
       	εαααβαααλ     ~ # ~ ? ~  ⊂ - - →~   ?   ~←$ ~
        ~   ~   ~   ⊂→%α|α∀ααα$	 |      %ααααααα$   ~
       	%ααα∀ααα$   ↑	% - → - →$                  ↓
		    %←ααα←ααααα←αααα←ααααα←ααααα←ααα$

.BOXB
.END
.LE(6,Algorithm for %3rplaca%1)
.END
.FP
The second function is
⊗→%3rplacd%*↔←%3[x;y]%* meaning replace the %3cdr%*-part of %3x%* with %3y%*.
The  AMBIT/G  description of this function was given on {yon(P246)}.


.BEGIN GROUP
Now %3nconc%* can be defined as⊗↓Since we're really involved with the representation
we use %3car%* and %3cdr%* rather than %3first%* and %3rest%*.← 
.BEGIN SELECT 3;TABIT2(16,19);TURN OFF"←";
.KRK
.BOXA

nconc <= λ[[x;y] prog[[z]
\\[null[x] → return[y]];
\\z ← x;
\a\[null[cdr[z]] → rplacd[z;y]; return [x]];
\\z ← cdr[z];
\\go[a] ]]

.BOXB
.END
.END

.BEGIN SELECT 3;TABIT2(23,30);TURN OFF"←";GROUP;
.KRK
.BOXA
%1Consider:%*\prog[[x]\x ← (NOTHING CAN GO WRONG);
\\rplacd[cdddr[x];cddr[x]];
\\print[x]]
.BOXB

.END
.FP
We can use %3rplacd%* to generate circular lists.
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In  general,  to circularize  a nonempty
list, %3x%*, %3rplacd[last[x];x]%* suffices where

←%3last <=λ[[x][null[cdr[x]] → x; %et%3 → last[cdr[x]]]]%1

Functions which modify list structure must be  used with  extreme caution.   
They are  not
recommended for amateur hacking.  They are introduced here simply to
show that  it  is  possible to  improve  on the  efficiency  of  pure
algorithms by resorting to these coding tricks.

.BEGIN GROUP
←%2Problems%*
.NL
1.##What is the effect of evaluating %3rplacd[x;cdr[x]]%*?
.NL
2.##Recall the problem on ⊗→hash consing↔← on {yon(P154)}.
There we were contemplating unique storage for all S-exprs.
Can such a scheme be reconciled (efficiently) with functions like
%3rplaca%* and %3rplacd%*?
.NL
3.##It has been pointed out that %3rplaca%1 and %3rplacd%1 are
closely related to assignment statements ⊗↑[And#76]↑. 
Extend one of our evaluators to recognize expressions like 
.BEGIN CENTER;TURN OFF "←";
%3car[%1<form>%3] ← %1<form>
.END
.ONCE INDENT 4;
as abbreviations for:
.BEGIN CENTER;SELECT 3;
rplaca[%1<form>; <form>%3]%1
.END
.ONCE INDENT 4,4;
This extension of assignment is obviously not restricted to
%3rplaca%1 but could allow arbitrary forms on the left-hand
side of an assignment.
.END
.END


.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
.BEGIN TURN ON "←";
We begin with rather simple examples. Consider the  problem of inserting
an element into the middle of a list.  For example let %3x%* be the list 
%3(A B C)%*.  If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform 

.pt18
.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]].
.END
.PT18
.FP
In general, we have little choice but to recopy  the initial segment
of %3x%*, adding %3D%* into the appropriate place.  A similar technique is
obviously available to delete a specified element from the interior of a list.

Careful use of %3rplacd%* can, in some instances, replace the heavy use of %3cons%*.
%3cons%* always carries with it the threat of a call on the garbage collector.

For example, given the list %3(A B C)%* with pointers  %3x%* and %3y%*  into it
as follows:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12
.BOXA
x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβαα→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.BOXB
.END


we could insert the element  %3D%*  %6after%* the first element in %3y%* by 

←%3rplacd[y;cons[D;cdr[y]]]%*, giving⊗↓Notice 
that %6one%* application of %3cons%* is unavoidable.←:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃        ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβα⊃    ⊂→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$ ↓ 	  ↑ %ααα∀αα$
			     ↓    ~
		        ⊂αααπααα⊃ ↑
			~ D ~ #αβα$
			%ααα∀ααα$

.BOXB
.END
.FP
But as always, be warned that the value of %3x%* has also been changed; and
any S-expr sharing the list %3x%* or %3y%* as a sublist has also been affected.

We can also use %3rplacd%* to delete not the %2first%*, but the next element 
in %3y%* by

.EQ
%3rplacd[y;cddr[y]]%*
.PT18;
.FP
Similarly, we can use %3rplaca%* to modify an element in a list (or S-expr).
To change the first element in the list, %3y%*, to the S-expr  %3z%* use

.EQ
%3rplaca[y;z]%*
.PT18;
.FP

Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %6after%* and delete %6after%*, rather than insert %6at%* or delete %6at%*.
If you look at a diagram you will see why.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12;
.BOXA
x	       
~	       
↓   ⊂αααπααα⊃      ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.BOXB
.END

To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell; a similar remark applies  to insertion at a specified cell. 
A simple, perhaps inefficient, scheme to support such modification
would be to start a second pointer
 from the beginning of the list, looking for the cell whose %3cdr%* pointed to the
desired spot; then make the modification.  

If these "modification-%6at%*" functions were to be performed very frequently,
then it might be worth starting %6two%* pointers down the list, one at %3x%*,
one, say %3y%*, at %3cdr[x]%*, as above.  Then testing could be done using %3y%*
and the modification   could be done using %3x%*. When we move %3y%* to
%3cdr[y]%*, we move %3x%* to %3cdr[x]%*.  If we wanted to modify
%6before%* rather than %6at%*, we could proliferate the "back pointers,"  but
 if this kind of generality is required a change of representation
is called for.  We might resort to the double-linking scheme introduced on 
{yon(P164)}; more complex representations are also discussed
in detail in ⊗↑[Knu#68]↑, Chapter 2.
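.GROUP;
.FP
For example, here is a minimal sketch of such a deletion function using a
trailing pointer; the name %3delete%1 and its details are illustrative
assumptions. It removes the first top-level occurrence of the atom %3z%1
from the list %3x%1, returning the resulting list.
.BEGIN SELECT 3;TABIT2(15,18);GROUP;TURN OFF"←";
.KRK
.BOXA
delete <= λ[[z;x] prog[[p]
\\[null[x] → return[x];
\\ eq[car[x];z] → return[cdr[x]]];
\\p ← x;
\a\[null[cdr[p]] → return[x];
\\ eq[cadr[p];z] → rplacd[p;cddr[p]];return[x]];
\\p ← cdr[p];
\\go[a] ]]
.BOXB
.END
.FP
Note that removing the %6first%* element still requires returning a different
pointer; modification of a predecessor cell cannot help there.
.APART;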

A LISP 
implementation which stores p-lists as list structure would 
use %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists would use these functions. Here are the
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.

.GROUP;
.DEF
%3putprop%* was introduced on {yon(P52)}.
Recall that the effect of %3putprop%* is to attach an indicator-value pair
to an atom. If the indicator is already present, then we will simply change its 
value;  
if the indicator is not present, then we will add the indicator-value 
pair to the front of the p-list. 
In the definition %3n%* is an atom, %3i%* is an indicator, and %3v%* is 
the value to be stored.

.BEGIN SELECT 3;TABIT2(17,20);GROUP; TURN OFF"←";
.KRK
.BOXA
putprop <= λ[[n;v;i] prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → rplaca[cdr[m];v];return[v]];
\\m ← cddr[m];
\\[null[m] → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]];
\\go[a] ]]
.BOXB
.END
.FP
Note that extended conditional expressions are used in the definition.

.APART
.GROUP;
.DEF
%3remprop%* was also introduced on {yon(P52)}.
%3remprop%* is a predicate  used to remove attribute-value pairs from
the property list of an atom.
We will capitalize on the LISP "%3NIL%1-non#%3NIL%1" trick
for predicates and return the removed property value if one is found.
The following implementation of %3remprop%1 does that.

.BEGIN SELECT 3;TABIT2(15,18);GROUP;TURN OFF"←";
.KRK
.BOXA

remprop <= λ[[n;i] prog[[m;v]
\\m ← n;
\a\[eq[cadr[m];i] → v ← caddr[m];rplacd[m;cdddr[m]];return[v]];
\\m ← cddr[m];
\\[null[cdr[m]] → return[%ef%3]];
\\go[a]]]
.BOXB
.END
.APART
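.GROUP;
.FP
For example, assume the p-list of an atom %3n%* is %3(COLOR RED)%*. Then
%3putprop[n;BLUE;COLOR]%* returns %3BLUE%*, changing the p-list to
%3(COLOR BLUE)%*; a subsequent %3remprop[n;COLOR]%* also returns %3BLUE%*
and removes the pair, while %3remprop[n;SIZE]%* simply returns %ef%1.
.APART;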
.FP
Applications of %3rplacd%* will occur inside ⊗→%3ratom%*↔← and
on {yon(P160)}
we will develop a version of LISP's parser which uses pointer modification to
gain efficiency.
Pointer modification is also used
inside the ⊗→garbage collector↔←.  Recall the %3sweep%1 phase of the collector
on {yon(P257)}.


.P161:
Finally, one of LISP's less illustrious uses of pointer modification
is to "allow" the construction of self-modifying programs.
This technique is similar  to the machine language tricks of self-modifying code
and should be used with similar frequency.
The freedom to hang yourself should not be construed as an invitation
to do so, but
it again points out the similarities of LISP to machine language and
highlights the differences between LISP and its contemporary high-level
languages.

LISP's  central processor %3eval%* operates by traversing and interpreting
the data structure representation of the program; that data structure is also
open for inspection by LISP's data structure manipulating functions.
Since we now have list-modifying functions, we could conceivably modify a program
by changing its internal structure. Indeed we can write a program which
modifies its %6own%* structure. 

.BEGIN TABIT2(14,16);GROUP;TURN OFF "←";
Here's one:
.SELECT 3;
.KRK;BOXA
foo <= λ[[x] prog[[y;z]
\\z←1;
\\y←sixth[body[foo]];
\a\print[x];
\\rplaca[rest[y];z←add1[z]];
\\go[a] ]]
.BOXB
.END
.FP
The mystery created by %3y%* is a pointer into the representation of the
statement %3print[x]%*; that representation is %3(PRINT#X)%*. Therefore
the effect of the first %3rplaca%* is to change %3(PRINT#X)%* to %3(PRINT#2)%*.
Subsequent passes through the loop will change the statement to print 
%33, 4,%* and so on. 
There really isn't much that can be said about such a program.

.CENT(Problems)
.NL
1.##More on %3ratom%*.
Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.
.NL
2.##Recall the function which reverses the top-level elements of a list.
On {yon(P16)} and {yon(P23)}
we wrote it in various styles and with varying degrees of efficiency.
All these functions used %3cons%*; however, it is clear that we should be
able to reverse a list without using %2any%* new cells.
Express this
algorithm as a LISP function. If you use %3prog%*, don't use any %3prog%*-variables.

.END
.GROUP;
.SS(Numbers,numbers)
.FP
In most implementations of LISP, numbers are stored  as very simple kinds of
atoms:  they do not  need print names,  but since  we should probably
allow fixed- and floating-point representation, we do  need indicators
for these properties.  Thus 

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 8,8;

fixed-point 1
~
↓
~  ⊂ααααααααπααα⊃   ⊂ααααα⊃ 
%→ ~ FIXNUM ~ #αβαα→~  1  ~
   %αααααααα∀ααα$   %ααααα$

floating-point 1
~
↓
~  ⊂ααααααααπααα⊃   ⊂ααααααααααααααααααααααα⊃ 
%→ ~ FLONUM ~ #αβαα→~ <machine rep  of 1.0> ~
   %αααααααα∀ααα$   %ααααααααααααααααααααααα$
.BOXB

.END
.APART;
.FP
In this representation, the number is stored in FWS and the
type is indicated by a minimal property list.
This representation
is a  bit expensive in space and can be expensive to manipulate with arithmetic
operators. Several tricks have been used to improve arithmetic in LISP.

Assume that the  addressing
space of the machine  is 2%818%* and that the usual  size of a LISP
core image is significantly smaller, say, N.  Then all memory address
references greater than N  are illegal (and trapped by  the monitor).
What we  will do is use  these illegal addresses to encode  some of the
smaller positive and  negative integers, mapping  zero on the  middle
address, the  positive numbers to  lower addresses and  the negatives
onto  the  higher   addresses.    Thus  these  smaller  integers, called
%2INUMS%*,  are
represented by pointers outside of the normal LISP  addressing space.
This  trick can  considerably decrease  the storage  requirements for
jobs  heavily using small  numbers.  (Numbers are  not usually stored
uniquely.)

.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 28;
.BOXA

	⊂αααααααααα⊃ 2∞818∞*
	~?~
	~?~
	~?~ m ∞1<∞* 0
	~?~
	~?~ m = 0
	~?~
	~?~ m ∞1>∞* 0
	~?~
	~?~
	εααααααααααλ N
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	%αααααααααα$ 0

.BOXB

.END
.LE(6,Picture of INUM Space)
.APART
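.GROUP;
.FP
A minimal sketch of such an encoding, where %3mid%1 is assumed to name the
middle address:
.EQ
%3inum <= λ[[m] difference[mid;m]]%1
.PT18;
.FP
A positive %3m%1 yields a lower address, and a negative %3m%1 a higher one;
since the mapping is its own inverse, the same computation decodes an INUM
pointer.
.APART;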
.FP
The INUM representation takes less space but adds an additional complexity:
now the arithmetic operators have to deal with three different kinds of numbers.

The MacLISP (⊗↑[Moo#74]↑) implementation uses a different representation for 
numbers.⊗↓Much care went into the MacLISP  number
handling since it is used as the implementation language for MACSYMA 
(⊗↑[MAC#74]↑,#⊗↑[Wan#75]↑,#⊗↑[Mos#74]↑),
a large algebraic and symbolic mathematics system. MacLISP's efficient
number facilities, coupled with its  optimizing compiler,
have resulted in the production of compiled code which is more efficient
than that produced by DEC's FORTRAN compiler (⊗↑[Fat#73]↑).← 
In that implementation, two spaces are allocated for number storage:
FIXNUM space and FLONUM space. This  makes a more compact representation since
the type information is implied in the address of the object  rather than 
being explicitly
stored. To those basic spaces we add two temporary stack areas: FIXPDL and
FLOPDL. These areas are used for temporary arithmetic computation.

The temporary areas work in conjunction with a  type declaration option
available in MacLISP. 
If we know that certain variables are %6always%1
going to be used as numbers in a particular  function, then  we can compile
better code. Assume %3x%1 and %3y%1 are to be used %6only%1 as FIXNUMs
within a function %3f%1; we would make such declarations for the LISP compiler
just as we can declare some variables as "special" to LISP compilers.
When we allocate space for %3x%1 and %3y%1, we allocate space on the
top of FIXPDL. Within %3f%1 the arithmetic operations use the hardware
arithmetic and reference the stack elements. The stack elements can be
passed to other arithmetic functions called within %3f%1, and no permanent
storage need be allocated in FIXNUM space until later.
The efficiency  of arithmetic operations is dependent on the existence
of special hardware instructions for such arithmetic.
However, special hardware also places limits on the arithmetic
capabilities of most languages:  arithmetic
is limited by the word size. LISP is able to overcome such limitations.

Most  numerically oriented  programs  are  faced  at some  time  with
overflow.  That is, they attempt  to construct a number  which is too
large to  be represented  in one  machine location.    There are  now
several   versions   of   LISP  which   will   automatically   change
representation  when faced  with overflow.   
This scheme is called %2arbitrary precision arithmetic%1 and
has been implemented for both fixed-point and floating-point
numbers. We will describe a representation for fixed-point numbers
called %2BIGNUMS%1; they could have the following structure:


.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 10,10;

positive big number
~
↓
~  ⊂ααααααααπααα⊃   ⊂αααπααα⊃        ⊂αααπαα⊃
%→ ~ POSNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
   %αααααααα∀ααα$   %αβα∀ααα$        %αβα∀αα$
		      ↓                ↓
		     N∞40∞*                N∞4n∞*

negative big number
~
↓
~  ⊂ααααααααπααα⊃   ⊂αααπααα⊃        ⊂αααπαα⊃
%→ ~ NEGNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
   %αααααααα∀ααα$   %αβα∀ααα$        %αβα∀αα$
    	              ↓		       ↓
		     N∞40∞*                N∞4n∞*


.BOXB
.END
.LE(8,Structure of a BIGNUM);
.APART

.BEGIN GROUP;
.FP
The value of a BIGNUM is given by:
.EQ
%7b%1(N%40%1 + %7a%1N%41%* + ... + %7a%8n%1N%4n%1)
.PT18;
.FP
where %7b%1 is either + or  - and 
%7a%1-1 is  the largest number  representable in one  machine word.
The translations between BIGNUMS and the other numeric representations
are done automatically within the evaluator.
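.FP
Here is a minimal sketch of that value computation on the digit list of a
BIGNUM, where %3base%1 is assumed to name %7a%1 and the sign %7b%1 is
supplied by the %3POSNUM%1 or %3NEGNUM%1 indicator:
.BEGIN SELECT 3;TABIT2(16,19);GROUP;TURN OFF"←";
.KRK
.BOXA
dval <= λ[[d]
\\[null[cdr[d]] → car[d];
\\ %et%3 → plus[car[d];times[base;dval[cdr[d]]]]]]
.BOXB
.END
.FP
Unfolding the recursion gives N%40%1 + %7a%1(N%41%1 + %7a%1( ... )), in
agreement with the value formula above.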

On most implementations of LISP, no attempt is made to store numbers uniquely.
Thus %3eq%1 will not work on numbers  other than INUMs; either 
%3equal%1 is extended for numbers
or a special equality predicate for numbers is  provided.
.END
.SS(Stacks and Threading)
.FP
Though  recursive descriptions of algorithms are usually more illuminating than
their typical machine-oriented counterparts, it is frequently more efficient
to encode those algorithms in manners which can take advantage of contemporary
hardware. This section will discuss two techniques which "unwind" the recursion
and typically lead to faster execution.

Recall the marking phase of a garbage collector in {yonss(P146)}. There
we wrote %3mark%* as a recursive algorithm. We could equally well write %3mark%*
using  an explicit stack:
.BEGIN TABIT3(6,12,27);SELECT 3;GROUP;TURN OFF "←";
.boxa
.krk

mark <= λ[[tr]prog[[st]
\loop\[is_marked[tr] → go[chk_st];
\\ is_full_wd[tr]→ markA[tr];go[chk_st];
\\ is_free_wd[tr]→\st←push[cdr[tr];st];
\\\markA[tr]; 
\\\tr←car[tr];go[loop];
\\ %et%* → go[chk_st]]; ⊗↓%1This  branch of the conditional could be omitted
and the  effect would be the same.←%3
\chk_st\[null[st] → return[%et%*]];
\\tr←top[st];
\\st←pop[st];
\\go[loop] ]]
.boxb
.APART;
.group;

push <= λ[[i;st] concat[i;st]]
top <= λ[[st] first[st]]
pop <= λ[[st] rest[st]]
.boxb
.END
.FP
Notice that we  save only the %3cdr%*-node in the stack; even at that the
stack grows proportionally to the depth of the tree being traversed. 
See the problem on {yon(P162)}.
The technique of using an explicit stack sometimes is more intuitive and 
sometimes will lead to faster execution.

The second technique is more  tricky but will lead to significant pay-offs
in execution time.⊗↓But there will be a proportional loss in clarity
in the code.←
The technique is called %2⊗→threading↔←%*.
The basis for threading is a desire to traverse tree structures in a more
efficient fashion than that typically available in recursion or via
stacks. Recall that on {yon(P164)} we surmised 
that %2⊗→double-linking↔←%* might be
advantageous in moving up and down the "spine" of a tree structure.
Double links would allow us to find the successors and predecessors of nodes
easily. However the extra link gives us no help if we wish
to descend into the substructure. It is this area to which threading addresses 
itself: descent into tree structure, and in fact, nonrecursive tree traversal.
Threading  comes in various flavors; we will now discuss a few.

Look at the new %3mark%*-er above; notice that for a
%6fixed%* tree and a %6fixed%* order of traversal,
any two applications of marking will
have the same pattern of behavior. The order of visitation to each node
will obviously be the same, but the dynamic changes in the state of the stack
will %6also%* be the same. Instead of recomputing that stack behavior on each
traversal, it might be possible to store the stack information in the structure itself.
This technique of hiding the control structure in the data structure is called
threading. 
The general application of threading is of questionable utility. It typically
requires a more complex data structure: we must be able to store  both
threads and links. The traversing algorithms also become more complex:
we obviously must be able to recognize the difference between 
control threads and data links. Care must also be taken if we wish to share
threaded list structure. See ⊗↑[Knu#68]↑ for a complete discussion
of the techniques and tricks.

We are not about to complicate the LISP structures, but dispensing with a 
stack, be it implicit or explicit, does have some merits. What we can
do in LISP is  strike a compromise. Instead of storing the threads permanently
in the structure, there are significant applications of threading
where we can %6temporarily%* store threads as we traverse trees. 
The first application is in the design of a nonrecursive %3read%* program.
The second application we will describe is in the mark phase of a garbage
collector.
.CENT(Problem)
.P162:
.NL
1.##With a little
more testing before stacking we can significantly cut down  the
number of %3push%*es we have to do. Namely, if some of the branches point
immediately to atoms we might as well mark them at that time and 
proceed without doing a stack operation. Only when both branches are "nonatomic"
do we need to stack the %3cdr%*. Write such an algorithm.
.SS(A Non-recursive %3read%*)
.P160:
.FP
The original %3read%* algorithm of {yonss(P13)} is a good example of
a clear recursive algorithm;
it is reasonably straightforward to follow the flow of the algorithm.
However, now that we understand what the run-time behavior of such a recursive
program is, we can see that %3read%* and friends are a drain on %6two%*
resources: they 
use free-space to %3cons%*-up the S-expr representation of the input;
they use  the run-time stack for  handling  the implementation of the
recursion and for saving parameters and intermediate computations. A deeply
nested expression will use a lot of the run-time stack.
Clearly, there is nothing we can do about the drain on the free lists,⊗↓We 
 probably will be drawing on the full word area for print name
storage as well as on the free space area for
list structure storage.←  but we %6can%* dispense with the run-time 
stack by threading.
We can in fact do so without a proportional increase in the use of cells in
free space; indeed we need only %6one%* additional free cell, regardless
of the complexity of the input! This is truly a win for efficiency; it will be
a big loss for clarity. But again that's why this section on storage and efficiency
is where it is; we have an understanding of the purpose and intent of %3read%*;
only after the basic algorithm is well understood should we attempt to be
clever and efficient.
First we describe the basic ideas of the algorithm, then we give the algorithm.

The main idea  in the algorithm is the realization that we really can determine
the storage requirements for a complex S-expr or list structure as we read it in.
For example consider 
the input string "%3(A#(B#C)#D)%*". As we start our left-to-right scan of the 
input,
we see "%3(%*". This immediately tells us that we need at least one %3cons%*.
We read "%3A%*"; that tells us what the %3car%* of the expression is. Notice that
we don't yet know whether the expression is "dotted" or "listed," but the
storage requirements will be the same. On reading the next open parenthesis we
know we need to add a new level in the  developing representation. The
"%3B%*" and "%3C%*" add elements to that level, and the closing parenthesis finishes
it off. The closing parenthesis also should signal our parser to return to the
prior level and continue scanning the input. The "%3D%*" goes on that
level and the final closing parenthesis completes the input.
All this is old, but the difference now is that we don't use recursion or an explicit
stack to keep track of where we are. We keep a thread in the %3cdr%*-part of 
the last cell on every level. When we go down a level we manufacture a new
cell with the %3cdr%* pointing to the cell we just came from in the previous
level; this happens when we see a left parenthesis. We go up a level when we see
a right parenthesis; that is done by following up the thread in the current level,
after doing appropriate cleanup. 

There are three basic states in the reader: 
.group;
.NL
%21.%*##The next input should go into the %3car%*-part of the current cell.
This state is entered when we go down a level. It is labeled %3head%* in the
following program.
.NL
%22.%*##The next input should go on the current level. This is the typical state
in the building of a list-input. Here we add a new cell in the current level 
and put the input in the %3car%*-part of that cell; then stay in this state.
This state corresponds to label %3tail%*.
.NL
%23.%*##The other main state occurs on reading a dot when in %3tail%*#state.⊗↓Dots 
seen in  any other context are errors.←  In dot#state we check the next input;
if it's an atom we stuff it on the thread and go up. If it's a  left parenthesis
we add  a new cell and go down.
.PT24
.APART
There are some anomalies in the algorithm due to the desire 
to recognize both S-expr notation
and list notation. The main malefactor is a count of parentheses; it is incremented
for left parentheses and decremented for right parentheses. A legal input has been
recognized when we are back at the top level and the count is zero.

The final difference between the old parser and the new one involves
the scanner  %3ratom%*. We assume a new %3ratom%* which reads %3()%* and
returns %3NIL%*. This is not too strenuous an assumption. If the scanner
sees an open parenthesis, it looks ahead to the next meaningful 
character.⊗↓It ignores 
spaces and the like.←  If the character is a closing parenthesis, 
the scanner  takes it;
if the character is not, it is left for the next call on %3ratom%* and
%3ratom%* returns with an indication that it has seen a left parenthesis.
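
.FP
To make the lookahead concrete, here is a minimal sketch of such a scanner.
The names are assumptions rather than parts of any implementation we have
described: %3token%* is the old scanner, %3peekc%* examines the next meaningful
character without consuming it, and %3readc%* consumes one character.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(6);
.KRK
.BOXA
ratom <= λ[[] prog[[j]
\j ← token[];
\[not[is_lpar[j]] → return[j]];
\[is_rpar[peekc[]] → readc[]; return[NIL]];
\return[j]]]
.BOXB
.END
.FP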
With this introduction, here is the new %3read%*:

.BEGIN SELECT 3; TABIT3(6,16,27);TURN OFF "←";GROUP;
.KRK
.BOXA

read <= λ[[]prog[[j;cp;count;top;temp]
\count←init[]; cp←count; top←cp;
head\j←ratom[];
\[or[is_dot[j];is_rpar[j]] → err[];

\ is_lpar[j] →\incr[count];
\\cp←down[cp];
\\go[head];
\ atom[j] → stuff[cp;j]; go[ckend]];
tail\j←ratom[];
\[atom[j] → cp←insert_move[cp;j]; go[ckend];

\ is_rpar[j] →\decr[count];
\\[eq[top;cp] → go[ck1];
\\ %et%* →  cp←stuff_up[cp;NIL]; go[ckend]];

\is_lpar[j] →\incr[count];
\\cp←down[insert_move[cp;NIL]];
\\go[head];

\is_dot[j] →\j←ratom[];
\\[or[is_dot[j];is_rpar[j]] → err[];
\\ is_lpar[j] →\incr[count];
\\\cp←insert_move[cp;NIL];
\\\go[head];

\\ atom[j] →\cp←stuff_up[cp;j];
\\\go[ckend]]]; ⊗↓%1This %3go%1 is superfluous, but makes the flow more apparent.←%3
ckend\[eq[cp;top] → go[ck1];
\ %et%* → go[tail]];
ck1\temp← cnt[top];
end2\[zerop[temp] → return[exp[top]];
\j←ratom[];
\[is_rpar[j] → temp←sub1[temp]; go[end2];
\ %et%* → err[] ]]]
.BOXB

.END
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(22);
.KRK
.BOXA
init <= λ[[] cons[NIL;0]]
stuff <= λ[[x;y] rplaca[x;y]]
incr <= λ[[z] rplacd[z;add1[cdr[z]]]]
decr <= λ[[z] rplacd[z;sub1[cdr[z]]]]

insert_move <= λ[[cp;val] rplacd[cp;cons[val;cdr[cp]]]; cdr[cp]]

down <= λ[[cp] rplaca[cp;cons[NIL;cp]];car[cp]]

stuff_up <= λ[[cp;j] prog[[temp]
\temp ← cdr[cp];
\rplacd[cp;j];
\return[temp]]]

cnt <= λ[[x] cdr[x]]    
exp <= λ[[x] car[x]]
.BOXB
.END
.FP
The development and understanding of this algorithm required most of what we
have covered in the course. 
We used our knowledge of the parser, %3read%*; we used our familiarity with
 S-exprs stored as linked lists; we have to understand the run-time control
of recursive calling sequences; we had to understand pointer manipulation;
we have to understand pointer modification; and finally we have to be wickedly
clever.
With that understanding we
were  able to apply threading at a level higher than a "once#only" trick.
.CENT(Problem)
.NL
%21.%*##Write a version of %3read%* which uses an explicit stack to remember where it
is in the parse.


.SS(More Applications of Threading)
.Cent(A Link-Bending Garbage Collector for LISP)
.P144:
.FP
The use of a stack is one of the difficulties associated with garbage 
collection. Garbage collection is invoked when available space has become
exhausted, but here we are  asking for %6more%* space to use for stacking.
The usual solution 
to such a problem is to allocate a separate area for stack storage. This 
has its drawbacks. If we don't allocate enough stack space, i.e., if the
depth of a piece of structure becomes too great, then the marker will fail.
The amount of stack space can become large, proportional to the
depth of a list.
We can apply threading here,
modifying the structure as we traverse it; as usual the threads will be used as
control information. As we finish marking a branch we
restore the structure to its original topology.
Several versions of such threaded collectors are available; see
⊗↑[Chr#68]↑ for a version
written in AMBIT/G; a more traditional description is found in
 ⊗↑[Sch#67]↑;⊗↓The correctness of [Sch#67] has been proved by  de#Roever.← 
 and see ⊗↑[Knu#68]↑ for several alternatives.
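
.FP
To indicate the flavor of such a link-bending marker, here is a sketch. It is
only a sketch: we assume each word carries a mark bit and one extra flag bit,
manipulated by the hypothetical primitives %3set_mark%*, %3marked%*, %3flag%*,
%3flagged%*, and %3unflag%*; the flag records whether we descended through the
%3car%* or the %3cdr%* of the word. The reversed links play the role of the stack.
.BEGIN SELECT 3; TABIT3(6,17,28);TURN OFF "←";GROUP;
.KRK
.BOXA

mark <= λ[[root] prog[[cur;prev;temp]
\cur ← root; prev ← NIL;
down\[atom[cur] → set_mark[cur]; go[up]];
\[marked[cur] → go[up]];
\set_mark[cur];
\temp ← car[cur]; rplaca[cur;prev];
\prev ← cur; cur ← temp;
\go[down];

up\[null[prev] → return[T];

\ flagged[prev] →\temp ← cdr[prev]; rplacd[prev;cur];
\\unflag[prev];
\\cur ← prev; prev ← temp; go[up];

\ %et%* →\temp ← car[prev]; rplaca[prev;cur];
\\flag[prev];
\\cur ← cdr[prev]; rplacd[prev;temp];
\\go[down]]]]
.BOXB
.END
.FP
When %3mark%* returns, every reversed link has been restored; beyond the mark
and flag bits, the only extra storage is the three %3prog%* variables.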

.Cent(Binding Implementations)
.FP
Threading can be used in the shallow binder described in {yonss(P228)} to  remember
the path through the environment tree#(⊗↑[Urm#76]↑). 
We thread from E%4bind%1 to E%4inter%1
when we are looking for E%4inter%1. 
This consists of reversing the access links as we proceed toward  E%4inter%1.
Then, as we swap back the value cells, we will unthread from
E%4inter%1 to E%4bind%1.
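
.FP
A sketch of the threading phase, assuming each environment block carries an
access link manipulated by the hypothetical %3acc%1 and %3set_acc%1; the value
returned is the head of the reversed chain:
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(6);
.KRK
.BOXA
thread <= λ[[e;dest] prog[[prev;temp]
\prev ← NIL;
loop\[eq[e;dest] → return[prev]];
\temp ← acc[e]; set_acc[e;prev];
\prev ← e; e ← temp;
\go[loop]]]
.BOXB
.END
.FP
Unthreading then follows the reversed chain back toward E%4bind%1, swapping value
cells and restoring each access link as it goes.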
.SS(Storage Management and LISP,,P224:)
.FP
There are two basic areas of LISP which require attention:
the implementation of data structures, and the implementation of the
LISP machine. We will discuss applications in that order.

LISP's most general data object is a dotted pair; however, we frequently
find ourselves using dotted pairs to represent more structured objects.
For example, many common LISP programs involve list#operations on
list representations. But lists, we know, are representations of sequences.
From {yonss(P137)}
we now also know that arrays are efficient representations of sequences.
Indeed array representations are typically more efficient than the
general LISP linked-list.  We would like to capitalize on this more 
efficient representation without jeopardizing the LISP operations.

An analysis of the LISP operations shows that we have to be able to %6share%*
substructures, and if using %3rplaca%* or %3rplacd%*, we have to be able to
%6modify%* existing structures. Any proposed economies in storage layout
must take cognizance of these facts. Fortunately these requirements are compatible.
Consider the typical  representation of the sequence:
.BEGIN CENTER;SELECT 3;

(LAMBDA (X) (F X Y))
.END

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααααααπααα⊃   ⊂αααπααα⊃  ⊂αααπαα⊃  
~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
%αααααααα∀ααα$   %αβα∀ααα$  %αβα∀αα$ 
		   ~	      ↓
	⊂αααααααααα$     ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
	↓		 ~ F ~ #αβ→~ X ~ #αβ→~ Y ~≤'~
	⊂αααπαα⊃	 %ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
	~ X ~≤'~  
	%ααα∀αα$

.BOXB
.END
.FP
This takes seven words. The %3car%*-part  of each cell contains the information;
the %3cdr%*-part tells where the rest of the expression is to be found.
That is, we have dedicated 14 half-words to represent the structure, only seven
of which contain the actual information we wish to store. 
Using some extra encoding we can carry the same information in 
seven cells. ⊗↓These cells would need to be slightly larger.←

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.boxa
.indent 15,15
⊂ααααααααα⊃
~↓ LAMBDA ~
εαααααααααλ     ⊂αααααααα⊃
~↓      #αβαααα→~/     X ~
εαααααααααλ     %αααααααα$
~/      #αβα⊃
%ααααααααα$ ↓   ⊂αααααααα⊃
	    %αα→~↓     F ~
		εααααααααλ
		~↓     X ~
		εααααααααλ
		~/     Y ~
		%αααααααα$

.boxb
.END
.FP
The intent of the special characters is to encode information about
the %2next%* cell in the representation. The technique is thus called %2⊗→%3cdr%*-coding↔←%1.
The %b↓%* means the next cell %2is%* the %3cdr%*; %b/%* means the %3cdr%* is 
%3NIL%*.

The typical LISP cell is a third variety of %3cdr%*-coding: the code %b→%* says
the next cell is a pointer to the %3cdr%*. With that, we introduce the final code:
%b*%* means this cell is the %3cdr%*-half of a LISP word.
Thus %3(A#B)%* could be expressed in any of the following forms:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααα⊃			⊂ααααα⊃
~↓  A ~			~→  A ~
εαααααλ			εαααααλ    ⊂ααααα⊃
~/  B ~			~*  #αβααα→~/  B ~
%ααααα$			%ααααα$    %ααααα$


⊂ααααα⊃			⊂αααααπααααα⊃    ⊂αααααπααααα⊃
~→  A ~			~  A  ~   #αβααα→~  B  ~  ≤' ~
εαααααλ   ⊂ααααα⊃	%ααααα∀ααααα$    %ααααα∀ααααα$
~*  #αβαα→~→  B ~
%ααααα$   εαααααλ
	  ~* NIL~
	  %ααααα$

.BOXB
.END
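.FP
A minimal sketch of a %3cdr%* which decodes these representations may help fix
the ideas. The primitives are assumptions: %3code[x]%* extracts the %3cdr%*-code
of the cell at address %3x%*, %3next[x]%* gives the address of the adjacent cell,
and %3contents[x]%* fetches a pointer stored in a cell. %3NILCODE%*, %3DOWN%*, and
%3PTR%* name the codes written %b/%*, %b↓%*, and %b→%* above; taking the %3cdr%*
of a cell marked %b*%* is an error in this sketch.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(14);
.KRK
.BOXA
cdr <= λ[[x]\[eq[code[x];NILCODE] → NIL;
\ eq[code[x];DOWN] → next[x];
\ eq[code[x];PTR] → contents[next[x]];
\ %et%* → err[]]]
.BOXB
.END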

However this encoding scheme is not sufficient as it stands. Consider the 
following example: Given internal pointers %3x%* and %3z%* into
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 13,13;
.BOXA

x	       z
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ F ~ #αβαα∀αα→~ X ~ #αβαα→~ Y ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.BOXB
.END
.FP
and assuming we wish to perform %3rplacd[x;(A#B#C)]%*,
in our standard implementation we would arrive at
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.indent 8,8
.BOXA

x	       z
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ F ~ # ~  %αα→~ X ~ #αβαα→~ Y ~≤'~
    %ααα∀αβα$      %ααα∀ααα$   %ααα∀αα$
          ↓
          ~ ⊂αααπααα⊃      ⊂αααπααα⊃   ⊂αααπαα⊃
          %→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
	    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$
.BOXB
.END
.FP
But if we assume %3(F X Y)%* is represented in its compact form,
we have some troubles.
We can't replace the cell 

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA

⊂ααααα⊃            ⊂αααααα⊃
~↓  X ~		by ~*   #αβ→ to ∞3(A B C)∞*
%ααααα$            %αααααα$
.BOXB
.END
.FP
since %3z%* would get justifiably upset. What we will do is use the
forwarding address scheme we introduced on {yon(P173)} in the compacting
garbage collector.
We put a forwarding address in the cell referenced by %3x%*;
then allocate a new pair of half-cells, putting %3F%* in the first and a pointer to
%3(A#B#C)%* in the second.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
   x	⊂ααααααααα⊃     ⊂αααααααα⊃   	⊂αααααααα⊃
   %→αα→~i      #αβαααα→~↓     F ~ ⊂→αα→~↓     A ~
 	εαααααααααλ     εααααααααλ ↑	εααααααααλ
   z →α→~↓      X ~     ~*     #αβα$	~↓     B ~
	εαααααααααλ     %αααααααα$	εααααααααλ
	~/      Y ~  			~/     C ~
	%ααααααααα$ 			%αααααααα$
.BOXB
.END

.FP
These forwarding addresses are an instance of %2⊗→invisible pointers↔←%*
proposed by R. Greenblatt in his LISP machine; he has also proposed
implementing %3cdr%*-coding in hardware.
Between invisible pointers and 
%3cdr%*-coding, we %6can%* effect all LISP operations using this
potentially more compact representation. 
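
.FP
The essential operation is a "chase" function which every pointer-following
primitive applies to its argument before operating. Here is a minimal sketch,
with %3is_invisible%* and %3contents%* as assumed primitives:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
chase <= λ[[x][is_invisible[x] → chase[contents[x]]; %et%* → x]]
.BOXB
.END
.FP
Each of %3car%*, %3cdr%*, %3rplaca%*, and %3rplacd%* would chase its argument
first; if invisible pointers are implemented in hardware, as Greenblatt proposed,
the chasing is transparent to the programmer.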

We  must be able to maintain that compact  representation while the program
is running. This requires more care in the  management of storage.
We cannot simply garbage collect and allow space to fragment; nor can we
use the simple compacting garbage collector
discussed in {yonss(P291)}, since it does not
attempt to  maintain the compact representation. 
Several algorithms
with the desired properties exist (⊗↑[Che#70]↑, ⊗↑[Cla#76]↑).
One feature of this data representation is its  use of variable-sized
linked blocks of sequential storage. The management of these storage
blocks is more complex than that of simple dotted#pairs, but the
additional overhead may be  acceptable if it gives better locality of reference
and faster access to  list elements.⊗↓Notice that the %3cdr%1-coded representations
of %3(A#B)%1 and %3(A#.#B)%1  are equally expensive. In the typical linked-list
representation, %3(A#B)%1  requires more space than %3(A#.#B)%1.← 

There is less  conflict about the use of  more complex storage
management techniques in the area of LISP's dynamic implementation.
The original versions of LISP#1.5 used dotted pair structure to represent
the access environments.⊗↓The control information did use a stack implementation
coded in machine language.←  This generality allowed a correct solution to the
implementation of %3function%1, but  experience with LISP implementations
has shown that it is quite expensive to  maintain this generality
when most applications are of a less general nature.
Implementation techniques, patterned after our Weizenbaum
diagrams, allow some economies without loss of generality.
Again, storage would be allocated in sequential blocks; each block would
be of a size sufficient to hold the representation of the name-value entries
along with the additional areas to link the block to the environment.
The storage blocks need not be allocated sequentially; indeed, in the
general case blocks cannot be allocated sequentially. The de-allocation
problems are somewhat different from those experienced 
by data structure representations.
The environment structures are much more "well#behaved" than general list-structures.
Therefore an "environment garbage collector" may not be needed.

The most general techniques for management of LISP's dynamic
environment are based on ⊗↑[Bob#73a]↑ and succeeding papers.⊗↓There is something
contradictory about LISP implementors'  attitudes toward  storage
and  dynamics. Much effort  is  expended in attempting to
minimize the overhead involved in  the dynamic operation of LISP;
it is frequently stated that  users should not be penalized
for access/control constructs which they do  not use. However, that
attitude is not extended to LISP's data structures. There are very
generous subsets of LISP applications in which the data structure operations
are suitably well-behaved, so that storage reclamation techniques
less general than garbage collection are applicable. Analysis of this
area of LISP should lead to  profitable  results.← 

At a lower level of implementation, LISP has much to say about machine
organization. The implementation of efficient environment-swapping
algorithms is a problem which  any operating system must  face.
The traditional solutions impose   severe restrictions on  interprocess
communications. The  algorithms developed for LISP show promise
for giving efficient implementations of more general scope.

LISP's organization of memory also has lessons for machine architecture. The
management of large variable-sized memory spaces such as those of ⊗↑[Ste#73]↑ or ⊗↑[Wegb#70]↑
can be supported in hardware. The allocation and de-allocation of such large
spaces also require care; LISP implementors have begun to address these problems
(⊗↑[Ste#76a]↑, ⊗↑[Bis#74a]↑).

Finally, the highly interactive nature of modern LISP programming
systems gives direction to efforts  developing  more "reactive"
programming systems (⊗↑[Win#75]↑).

.SS(Hash Techniques,,P248:)
.FP
One very significant problem experienced by a LISP programmer is the 
sheer size of data structures which a large LISP program generates.
Many LISP projects  approach 10%87%1 bits of program and data.
Several techniques have been  developed to help shrink  data representation;
%3cdr%1-coding ({yonss(P224)}) is one technique. Another technique stems
from the  observation that LISP tends to %6copy%1 structures rather
than %6share%1 them. We know that  the sharing of structures  must be done
with great care if modification operations like %3rplaca%1 and %3rplacd%1
are present, but  sharing of structure can mean a significant saving in space.
In fact, the saving can also be  reflected in the algorithms which manipulate
the structures. For example  if every list#structure is stored uniquely,
then the time for the equality test %3equal%1 is a constant rather than being
proportional to the  depth of the structure.

There are two alternatives for maintaining unique structure:
either maintain list space such that unique representations are always 
present  or supply an algorithm which will "uniquize" structures upon
request. The first alternative is usually called %2⊗→hash consing↔←%1; the
second technique is called %2list condensation%1#(⊗↑[Lin#73]↑).
A condensation algorithm must remove all duplicated structure from within
a list. Since condensation is a component of many  hashed LISP implementations,
we will concentrate our attention on hash consing.

Hash consing is an extension of the  LISP technique for generating
unique atoms.  Since
list structure is created only by the %3cons%1 operation,⊗↓However, list structure
may be modified by %3rplaca%1 and %3rplacd%1.←
we place the responsibility for maintaining  unique structure on %3cons%1.
If the result of a pending %3cons%1 is already present, we
return a pointer to that structure, otherwise we perform the %3cons%1
and record the result  so that it will be retrieved if the same
%3cons%1 happens again. The adjective "hash" is applied to this version of
%3cons%1 since the typical implementation uses a hashing algorithm to
maintain the uniqueness.
Hash %3cons%1ing imposes restrictions on the
programmer's use of list modification operations. Since unique structures
are shared, severe difficulties result if modifications are made.
One  either may   disallow list modification  or may supply additional operations
to copy structure, modify it, and "uniquize" the result. Or  an implementation may
supply different kinds of structures, some modifiable  and some not modifiable.
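
.FP
Here is a minimal sketch of such a %3cons%1, assuming a hash table manipulated
by the hypothetical %3hash_find%1, which returns the recorded cell or %3NIL%1,
and %3hash_enter%1:
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(6);
.KRK
.BOXA
hcons <= λ[[x;y] prog[[z]
\z ← hash_find[x;y];
\[not[null[z]] → return[z]];
\z ← cons[x;y];
\hash_enter[x;y;z];
\return[z]]]
.BOXB
.END
.FP
If every structure is built with such a %3cons%1, the %3equal%1 test on two
structures reduces to an %3eq%1 test on their pointers.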

A hash %3cons%1 was proposed for LISP 1.75 on the IBM#M44, but the
implementation was never
completed. A limited version of hash %3cons%1ing was implemented as
an extension of LISP 1.6 at Stanford.

The most impressive and extensive applications of hashing appear
in HLISP (⊗↑[Got#74]↑, ⊗↑[Ter#75]↑). That implementation of LISP
supplies two different kinds of structures: ordinary list structure
and "monocopy" structures. Operations are also supplied for
conversion between types. Extensive analysis of hashing effects
on algorithm performance has also been done (⊗↑[Got#76]↑).
HLISP also employs hashing in its implementation of property lists.
Property lists are not stored explicitly, but rather the atom name and the 
property name are used to form a hash index; the property value is associated
with that hash index.
For example,
%3get[x;i]%1 hashes  with both  %3x%1 and %3i%1 to  retrieve  the property value.
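
.FP
A sketch of the idea, though not of HLISP's actual definitions: assume %3hash2%1
forms an index from the two names, and %3hash_set%1 and %3hash_get%1 store and
retrieve a value at an index.
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
putprop <= λ[[x;i;v] hash_set[hash2[x;i];v]]
get <= λ[[x;i] hash_get[hash2[x;i]]]
.BOXB
.END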

The other major implementations of LISP also
offer specialized operations for dealing with hashed  quantities;
see ⊗↑[Moo#74]↑, ⊗↑[Int#75]↑, and ⊗↑[Bob#75]↑.